Accurate object tracking algorithm based on distance weighting overlap prediction and ellipse fitting optimization
WANG Ning, SONG Huihui, ZHANG Kaihua
Journal of Computer Applications    2021, 41 (4): 1100-1105.   DOI: 10.11772/j.issn.1001-9081.2020060869
In order to solve the problems of Discriminative Correlation Filter (DCF) tracking algorithms, such as model drift, coarse scale estimation and tracking failure when the tracked object undergoes rotation or non-rigid deformation, an accurate object tracking algorithm based on Distance Weighting Overlap Prediction and Ellipse Fitting Optimization (DWOP-EFO) was proposed. Firstly, the overlap and the center distance between bounding boxes were both used as the basis for evaluating dynamic anchor boxes, which narrows the spatial distance between the prediction result and the object region and eases the model drift problem. Secondly, in order to further improve tracking accuracy, a lightweight object segmentation network was applied to segment the object from the background, and an ellipse fitting algorithm was applied to optimize the segmentation contour and output a stable rotated bounding box, achieving accurate estimation of the object scale. Finally, a scale-confidence optimization strategy was used to gate the output so that only high-confidence scale results are adopted. The proposed algorithm alleviates model drift and improves both the robustness and the accuracy of the tracker. Experiments were conducted on two widely used evaluation datasets, the Visual Object Tracking challenge (VOT2018) and the Object Tracking Benchmark (OTB100). Experimental results demonstrate that the proposed algorithm improves the Expected Average Overlap (EAO) index by 2.2 percentage points compared with Accurate Tracking by Overlap Maximization (ATOM) and by 1.9 percentage points compared with Learning Discriminative Model Prediction for Tracking (DiMP). Meanwhile, on OTB100, the proposed algorithm outperforms ATOM by 1.3 percentage points in success rate and performs particularly well on the non-rigid deformation attribute. The proposed algorithm runs at more than 25 frame/s on average on the evaluation datasets, realizing real-time tracking.
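The abstract does not give the exact scoring or fitting formulas, so the following is only a minimal sketch: it assumes a simple score of the form IoU minus a normalized center-distance penalty for ranking candidate boxes, and uses OpenCV's ellipse fitting to derive a rotated box from a segmentation mask. All function names and the penalty weight lam are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import cv2  # OpenCV >= 4 assumed for the findContours return signature


def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-8)


def distance_weighted_score(anchor, prev_box, lam=0.5):
    """Hypothetical anchor score: overlap minus a center-distance penalty,
    so candidates far from the previous target location are down-weighted."""
    ca = np.array([(anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2])
    cp = np.array([(prev_box[0] + prev_box[2]) / 2, (prev_box[1] + prev_box[3]) / 2])
    diag = np.hypot(prev_box[2] - prev_box[0], prev_box[3] - prev_box[1])
    return iou(anchor, prev_box) - lam * np.linalg.norm(ca - cp) / (diag + 1e-8)


def rotated_box_from_mask(mask):
    """Fit an ellipse to the largest segmentation contour (needs >= 5 points)
    and return the four corners of the corresponding rotated bounding box."""
    contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)
    (cx, cy), (major, minor), angle = cv2.fitEllipse(contour)
    return cv2.boxPoints(((cx, cy), (major, minor), angle))


if __name__ == "__main__":
    mask = np.zeros((100, 100), np.uint8)
    cv2.ellipse(mask, (50, 50), (30, 15), 30, 0, 360, 1, -1)  # toy target mask
    print(rotated_box_from_mask(mask))
```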
Multi-level feature enhancement for real-time visual tracking
FEI Dasheng, SONG Huihui, ZHANG Kaihua
Journal of Computer Applications    2020, 40 (11): 3300-3305.   DOI: 10.11772/j.issn.1001-9081.2020040514
In order to solve the problem that the Fully-Convolutional Siamese visual tracking network (SiamFC) drifts from the tracked target when interferers with similar semantic information appear, resulting in tracking failure, a Multi-level Feature Enhanced Siamese network (MFESiam) was designed to improve the robustness of the tracker by enhancing the representation capabilities of the high-level and shallow-level features respectively. Firstly, a lightweight and effective feature fusion strategy was adopted for the shallow-level features: a data augmentation technique was utilized to simulate changes found in complex scenes, such as occlusion, similarity interference and fast motion, to enhance the texture characteristics of the shallow features. Secondly, for the high-level features, a Pixel-aware global Contextual Attention Module (PCAM) was proposed to improve localization ability by capturing long-range dependencies. Finally, extensive experiments were conducted on three challenging tracking benchmarks: OTB2015, GOT-10K and the 2018 Visual Object Tracking challenge (VOT2018). Experimental results show that the proposed algorithm outperforms the baseline SiamFC in success rate by 6.3 percentage points on OTB2015 and 4.1 percentage points on GOT-10K while running at 45 frames per second, achieving real-time tracking. Its expected average overlap also surpasses that of the VOT2018 real-time challenge winner, the high-performance Siamese Region Proposal Network (SiamRPN), which verifies the effectiveness of the proposed algorithm.
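The abstract names PCAM but does not describe its internals; the block below is a minimal sketch, assuming a GCNet-style global-context attention in PyTorch in which every pixel contributes, via a softmax over positions, to one global context vector that is then transformed and added back. The class name, channel count and reduction ratio are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalContextAttention(nn.Module):
    """Sketch of a pixel-aware global context block: per-pixel attention logits
    pool the feature map into a single context vector that captures long-range
    dependencies, which is transformed and broadcast back to every position."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.context_mask = nn.Conv2d(channels, 1, kernel_size=1)  # attention logits
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.LayerNorm([channels // reduction, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        weights = F.softmax(self.context_mask(x).view(b, 1, h * w), dim=-1)
        context = torch.bmm(x.view(b, c, h * w), weights.transpose(1, 2))  # (b, c, 1)
        context = context.view(b, c, 1, 1)
        return x + self.transform(context)  # add the modelled global context


feat = torch.randn(1, 256, 22, 22)              # toy high-level feature map
print(GlobalContextAttention(256)(feat).shape)  # torch.Size([1, 256, 22, 22])
```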
Mixed-order channel attention network for single image super-resolution reconstruction
YAO Lu, SONG Huihui, ZHANG Kaihua
Journal of Computer Applications    2020, 40 (10): 3048-3053.   DOI: 10.11772/j.issn.1001-9081.2020020281
The channel attention mechanisms currently used for super-resolution reconstruction have two problems: the attention prediction destroys the direct correspondence between each channel and its weight, and each mechanism considers only first-order or only second-order channel attention, without exploiting their complementary advantages. Therefore, a mixed-order channel attention network for image super-resolution reconstruction was proposed. First of all, using a local cross-channel interaction strategy, the channel-dimension reduction and expansion used by the first-order and second-order channel attention models were replaced with a fast one-dimensional convolution of kernel size k, which not only makes the channel attention prediction more direct and accurate but also makes the resulting model simpler than before. In addition, the improved first-order and second-order channel attention models were combined to exploit the advantages of channel attention of different orders, thereby improving the discriminative ability of the network. Experimental results on benchmark datasets show that, compared with existing super-resolution algorithms, the proposed method best recovers the texture details and high-frequency information of the reconstructed images, and its Perceptual Index (PI) on the Set5 and BSD100 datasets is improved by 0.3 and 0.1 on average respectively. This shows that the network predicts channel attention more accurately and comprehensively exploits channel attention of different orders, thereby improving performance.
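A minimal sketch of the local cross-channel interaction idea: as in ECA-style attention, the squeeze-and-excite dimension reduction is replaced by a one-dimensional convolution of kernel size k applied to a first-order (mean) descriptor and a second-order (covariance-based) descriptor, with the two attention vectors averaged. The fusion rule and descriptor definitions here are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn


class MixedOrderChannelAttention(nn.Module):
    """Mixed-order channel attention sketch: a 1-D convolution with kernel k acts
    directly on the channel descriptors, keeping each channel aligned with its
    own weight instead of going through dimension reduction and expansion."""

    def __init__(self, k=3):
        super().__init__()
        self.conv_first = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.conv_second = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)
        first = flat.mean(dim=-1)                                    # first-order stats
        centred = flat - first.unsqueeze(-1)
        cov = torch.bmm(centred, centred.transpose(1, 2)) / (h * w - 1)
        second = cov.mean(dim=-1)                                    # second-order summary
        a1 = torch.sigmoid(self.conv_first(first.unsqueeze(1))).squeeze(1)
        a2 = torch.sigmoid(self.conv_second(second.unsqueeze(1))).squeeze(1)
        attention = 0.5 * (a1 + a2)                                  # assumed averaging fusion
        return x * attention.view(b, c, 1, 1)


x = torch.randn(2, 64, 48, 48)
print(MixedOrderChannelAttention(k=3)(x).shape)  # torch.Size([2, 64, 48, 48])
```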
Video object segmentation method based on dual pyramid network
JIANG Sihao, SONG Huihui, ZHANG Kaihua, TANG Runfa
Journal of Computer Applications    2019, 39 (8): 2242-2246.   DOI: 10.11772/j.issn.1001-9081.2018122566
Focusing on the difficulty of segmenting a specific object in a complex video scene, a video object segmentation method based on a Dual Pyramid Network (DPN) was proposed. Firstly, a one-way modulating network was used to make the segmentation model adapt to the appearance of a specific object; that is, a modulator was learned from the visual and spatial information of the target object to modulate the intermediate layers of the segmentation network so that the network adapts to appearance changes of the specific object. Secondly, global context information was aggregated in the last layer of the segmentation network by a context aggregation method based on different regions. Finally, a left-to-right architecture with lateral connections was developed to build high-level semantic feature maps at all scales. The proposed video object segmentation network can be trained end-to-end. Extensive experimental results show that the proposed method achieves results competitive with state-of-the-art methods that use online fine-tuning on the DAVIS2016 dataset, and outperforms other methods on the DAVIS2017 dataset.
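The abstract describes the method only at a high level; the sketch below shows, under assumed channel sizes and module names, what a pyramid path with lateral connections plus a channel-wise modulator predicted from the target could look like in PyTorch. It illustrates the architectural idea only, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModulatedPyramid(nn.Module):
    """Sketch: lateral 1x1 convolutions project backbone features to a common width,
    higher-level maps are upsampled and added along the pyramid path, and a
    modulation vector learned from the target's appearance gates every level."""

    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.laterals = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)
        self.modulator = nn.Linear(in_channels[-1], out_channels)  # per-channel gate

    def forward(self, feats, target_embedding):
        # feats: backbone maps from low to high level; target_embedding: (b, C_top)
        laterals = [lat(f) for lat, f in zip(self.laterals, feats)]
        for i in range(len(laterals) - 1, 0, -1):  # propagate semantics downwards
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        gate = torch.sigmoid(self.modulator(target_embedding))[:, :, None, None]
        return [s(p) * gate for s, p in zip(self.smooth, laterals)]


feats = [torch.randn(1, 256, 64, 64), torch.randn(1, 512, 32, 32), torch.randn(1, 1024, 16, 16)]
target = torch.randn(1, 1024)  # pooled appearance of the annotated first-frame object
print([o.shape for o in ModulatedPyramid()(feats, target)])
```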
Real-time visual tracking based on dual attention siamese network
YANG Kang, SONG Huihui, ZHANG Kaihua
Journal of Computer Applications    2019, 39 (6): 1652-1656.   DOI: 10.11772/j.issn.1001-9081.2018112419
In order to solve the problem that the Fully-Convolutional Siamese network (SiamFC) tracking algorithm is prone to model drift, and hence to tracking failure, when the tracked target undergoes dramatic appearance changes, a new Dual Attention Siamese network (DASiam) was proposed that adapts the network model without online updating. Firstly, a modified Visual Geometry Group (VGG) network, which is more expressive and better suited to the tracking task, was used as the backbone. Then, a novel dual attention mechanism, consisting of a channel attention module and a spatial attention module, was added to the middle layers of the network to extract features dynamically: the channel dimension and the spatial dimension of the feature maps were transformed to obtain the corresponding attention maps. Finally, the feature representation of the model was further improved by fusing the feature maps produced by the two attention mechanisms. Experiments were conducted on three challenging tracking benchmarks: OTB2013, OTB100 and the 2017 Visual Object Tracking challenge (VOT2017) real-time challenge. The experimental results show that, running at 40 frame/s, the proposed algorithm achieves higher success rates than the baseline SiamFC by 3.5 percentage points on OTB2013 and 3 percentage points on OTB100, and surpasses the 2017 champion SiamFC in the VOT2017 real-time challenge, verifying the effectiveness of the proposed algorithm.
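The abstract does not give the exact formulation of the two attention branches; below is a minimal PyTorch sketch assuming a CBAM-like pairing of channel attention (squeeze-and-excite over channels) and spatial attention (convolution over pooled channel statistics), with the two attended maps fused by summation. Names and the reduction ratio are assumptions.

```python
import torch
import torch.nn as nn


class DualAttention(nn.Module):
    """Sketch of a dual attention block: one branch reweights channels, the other
    reweights spatial positions, and the two attended feature maps are fused."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        channel_branch = x * self.channel_gate(x)              # channel reweighting
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.max(dim=1, keepdim=True).values], dim=1)
        spatial_branch = x * self.spatial_gate(pooled)         # spatial reweighting
        return channel_branch + spatial_branch                 # assumed fusion by sum


x = torch.randn(1, 128, 28, 28)
print(DualAttention(128)(x).shape)  # torch.Size([1, 128, 28, 28])
```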
Real-time visual tracking algorithm via channel stability weighted complementary learning
FAN Jiaqing, SONG Huihui, ZHANG Kaihua
Journal of Computer Applications    2018, 38 (6): 1751-1754.   DOI: 10.11772/j.issn.1001-9081.2017112735
In order to solve the problem that the Sum of Template And Pixel-wise LEarners (Staple) tracking algorithm fails under in-plane rotation and partial occlusion, a simple and effective Channel Stability-weighted Staple (CSStaple) tracking algorithm was proposed. Firstly, a standard correlation filter classifier was employed to compute the response of each channel. Then, the stability weight of each channel was calculated and multiplied by each layer's weight to obtain the correlation filter response. Finally, the response of the complementary color learner was integrated to obtain the final response, and the location of the maximum value in the response was taken as the tracking result. The proposed algorithm was compared with several state-of-the-art tracking algorithms, including Channel and Spatial Reliability Discriminative Correlation Filter (CSR-DCF) tracking, Hedged Deep Tracking (HDT), Kernelized Correlation Filter (KCF) tracking and Staple. The experimental results show that the proposed algorithm performs best in success rate, exceeding Staple by 2.5 percentage points on OTB50 and 0.9 percentage points on OTB100, which proves its effectiveness under in-plane rotation and partial occlusion of the target.
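A minimal NumPy sketch of the fusion logic described above, assuming the per-channel stability weight is a peak-to-mean ratio of that channel's correlation response and that the correlation-filter response is blended with the color learner's response by a fixed merge factor as in Staple-style complementary learning; the stability definition, layer weights and merge factor are assumptions.

```python
import numpy as np


def channel_stability_weights(responses):
    """Stability of each filter channel, taken here as the (normalized)
    peak-to-mean ratio of that channel's response map."""
    peaks = responses.max(axis=(1, 2))
    means = responses.mean(axis=(1, 2)) + 1e-8
    weights = peaks / means
    return weights / weights.sum()


def fuse_responses(cf_responses, layer_weights, color_response, merge_factor=0.3):
    """Weight each correlation-filter channel by its stability times its layer
    weight, then blend with the color (histogram) learner response."""
    weights = channel_stability_weights(cf_responses) * layer_weights
    cf = np.tensordot(weights, cf_responses, axes=(0, 0))
    return (1 - merge_factor) * cf + merge_factor * color_response


rng = np.random.default_rng(0)
cf = rng.random((31, 50, 50))                   # toy per-channel correlation responses
color = rng.random((50, 50))                    # toy color-learner response
final = fuse_responses(cf, np.ones(31), color)
print(np.unravel_index(final.argmax(), final.shape))  # tracked position = response peak
```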
Face super-resolution via very deep convolutional neural network
SUN Yitang, SONG Huihui, ZHANG Kaihua, YAN Fei
Journal of Computer Applications    2018, 38 (4): 1141-1145.   DOI: 10.11772/j.issn.1001-9081.2017092378
To handle face super-resolution at multiple scale factors, a face super-resolution method based on a very deep convolutional neural network was proposed; experiments showed that increasing network depth effectively improves the accuracy of face reconstruction. Firstly, a network consisting of 20 convolution layers was designed to learn an end-to-end mapping between low-resolution and high-resolution images, with many small filters cascaded to extract more texture information. Secondly, residual learning was introduced to solve the loss of detail information caused by the increased depth. In addition, low-resolution face images with multiple scale factors were merged into one training set, enabling the network to perform face super-resolution at multiple scale factors. Results on the CAS-PEAL test dataset show that, compared with bicubic-based face reconstruction, the proposed very deep convolutional neural network improves the Peak Signal-to-Noise Ratio (PSNR) by 2.7 dB and the structural similarity by 2%, and it also clearly outperforms the SRCNN method in both accuracy and visual quality. This indicates that deeper network structures achieve better reconstruction results.
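A compact PyTorch sketch of the 20-layer residual design described above (VDSR-style): the network takes a bicubic-upsampled low-resolution face and predicts the residual detail that is added back to it, and a single model can cover several scale factors because all scales are upsampled to the target size before training. The filter count and single-channel (luminance) input are assumptions.

```python
import torch
import torch.nn as nn


class VeryDeepSRNet(nn.Module):
    """20-layer convolutional network with residual learning: cascaded 3x3 filters
    extract texture, and the output is the input plus the predicted residual."""

    def __init__(self, depth=20, features=64):
        super().__init__()
        layers = [nn.Conv2d(1, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):                        # cascade of small filters
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, 1, 3, padding=1)]  # predict the residual image
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.body(x)   # residual learning eases training of a deep network


# Low-resolution faces at every scale factor are bicubic-upsampled to a common
# size first, so one network handles multiple scale factors.
lr_upsampled = torch.randn(1, 1, 128, 128)
print(VeryDeepSRNet()(lr_upsampled).shape)  # torch.Size([1, 1, 128, 128])
```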
Unsupervised video segmentation by fusing multiple spatio-temporal feature representations
LI Xuejun, ZHANG Kaihua, SONG Huihui
Journal of Computer Applications    2017, 37 (11): 3134-3138.   DOI: 10.11772/j.issn.1001-9081.2017.11.3134
To cope with random movement of the segmented target, rapid background change, and arbitrary variation and shape deformation of the object's appearance, a new unsupervised video segmentation algorithm based on multiple spatio-temporal feature representations was presented. By combining saliency features with features obtained from pixels and superpixels, a coarse-to-fine robust feature representation was designed to represent each frame of a video sequence. Firstly, a set of superpixels was generated to represent the foreground and background, improving computational efficiency, and segmentation results were obtained with the graph-cut algorithm. Then, the optical flow method was used to propagate information between adjacent frames, and the appearance of each superpixel was updated with its non-local spatio-temporal features, obtained by nearest-neighbor search with an efficient K-Dimensional tree (K-D tree) algorithm, so as to improve the robustness of the segmentation. After that, a pixel-based Gaussian mixture model was constructed on the superpixel-level segmentation results to achieve pixel-level refinement. Finally, the image saliency features, together with the segmentation results produced by graph-cut and the Gaussian mixture model, were combined by a voting scheme to obtain more accurate segmentation results. The experimental results show that the proposed algorithm is robust and effective, and is superior to most unsupervised video segmentation algorithms and to some semi-supervised video segmentation algorithms.
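The sketch below illustrates two of the steps described above with simplified stand-ins: propagating superpixel foreground/background labels between frames by k-nearest-neighbor search with a K-D tree, and fusing several candidate masks by weighted voting. The feature descriptors, k, and the voting weights are assumptions; the real method also involves optical flow, graph-cut and a Gaussian mixture model.

```python
import numpy as np
from scipy.spatial import cKDTree


def propagate_superpixel_labels(prev_feats, prev_labels, curr_feats, k=5):
    """Transfer foreground/background labels from the previous frame's superpixels
    to the current frame's superpixels via k-nearest-neighbor search in feature space."""
    tree = cKDTree(prev_feats)
    _, idx = tree.query(curr_feats, k=k)          # k nearest previous superpixels
    neighbor_labels = prev_labels[idx]            # (n_curr, k)
    return (neighbor_labels.mean(axis=1) > 0.5).astype(np.uint8)


def vote_masks(masks, weights=None):
    """Fuse candidate segmentations (e.g. graph-cut, GMM, saliency) by weighted voting."""
    stack = np.stack(masks).astype(float)
    weights = np.ones(len(stack)) if weights is None else np.asarray(weights, float)
    combined = np.tensordot(weights / weights.sum(), stack, axes=(0, 0))
    return (combined > 0.5).astype(np.uint8)


rng = np.random.default_rng(1)
prev_feats, curr_feats = rng.random((200, 6)), rng.random((180, 6))  # toy descriptors
prev_labels = (rng.random(200) > 0.5).astype(np.uint8)
fg = propagate_superpixel_labels(prev_feats, prev_labels, curr_feats)
fused = vote_masks([rng.integers(0, 2, (64, 64)) for _ in range(3)])
print(fg.shape, fused.shape)
```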